INDEX
Explanations
the word "one" in various contexts
Negative Logits
uits        -0.89
its         -0.81
ourses      -0.77
models      -0.77
ooks        -0.74
folk        -0.73
Parties     -0.71
acements    -0.71
ories       -0.71
items       -0.71
Positive Logits
apiece          0.96
unnamed         0.79
hundred         0.79
else            0.68
person          0.67
single          0.67
unidentified    0.67
Hundred         0.67
observer        0.66
sided           0.65
Activations Density: 0.054%