INDEX
Explanations
locations or places
punctuation marks, primarily periods
Negative Logits
endeav        -0.79
neglect       -0.79
mathemat      -0.78
unexplained   -0.78
plaus         -0.77
persuasion    -0.77
bending       -0.75
modelling     -0.75
stubborn      -0.74
inertia       -0.74
Positive Logits
Additionally    1.37
Tickets         1.34
Tickets         1.19
Additionally    1.12
<|endoftext|>   1.08
Additional      1.07
NOTE            1.06
Previously      1.02
Also            1.02
Fans            1.00
Activations Density 0.304%