Explanations
fruit-related nouns and adjectives
references to music, flavors, and preferences
Negative Logits
âĢł            -0.74
ãĥ©ãĥ³        -0.71
ãĤ¿            -0.66
accompan       -0.66
âĹ¼            -0.65
å£             -0.64
éŃĶ            -0.63
æł             -0.62
áµ             -0.60
âĢ¢âĢ¢âĢ¢âĢ¢   -0.60
Positive Logits
anymore        1.45
either         1.42
nor            1.15
whatsoever     1.15
slightest      1.14
yet            1.12
nor            1.12
anyway         1.01
:(             1.00
necessarily    0.97
Activations Density 0.478%