Explanations
proper nouns like names of people and places
numerical data and measurements related to events or issues
Negative Logits
persuasion    -0.81
purs          -0.73
dra           -0.68
incent        -0.68
ability       -0.67
renegoti      -0.66
circuits      -0.66
smoot         -0.64
iga           -0.64
afterlife     -0.62
Positive Logits
Interested    1.21
Writing       1.16
According     1.16
Specifically  1.11
Speaking      1.08
Sources       1.07
RELATED       1.06
While         1.05
Known         1.03
The           1.02
Activations Density: 0.433%