Explanations
phrases related to social, political, and personal interactions
punctuation marks and grammatical structures within the text
Negative Logits
erver       -0.90
729         -0.87
uliffe      -0.84
ongyang     -0.84
ŃĶ          -0.83
rabbit      -0.82
okane       -0.80
rabbits     -0.80
adelphia    -0.79
oultry      -0.79
Positive Logits
Sand        1.29
Sand        1.08
Sard        1.05
Sed         1.00
Alexand     1.00
Piet        0.99
Sands       0.98
Ast         0.97
Spect       0.97
Cent        0.96
Activations Density: 0.493%