INDEX
Explanations
mentions of something being special in some context
the concept of "special."
Negative Logits
Token    Logit
Twain    -0.69
Ri       -0.66
anon     -0.65
Giul     -0.65
Cah      -0.63
Ķ        -0.63
Cao      -0.62
amus     -0.61
Į        -0.60
·        -0.60
Positive Logits
Token        Logit
ised         1.15
ties         1.00
izations     0.96
isations     0.95
isable       0.90
ized         0.89
marine       0.84
ities        0.83
isal         0.81
atural       0.81
Activations Density 0.022%
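
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of token positions at which the feature's activation is above zero. The function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration; the source does not state how its 0.022% value was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    `activations` is a flat sequence of per-token activation values for a
    single feature. The zero threshold is an assumption, not taken from
    the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, of which 2 are active.
sample = [0.0] * 10_000
sample[17] = 3.2
sample[4242] = 0.8

density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.020%
```

A density this low means the feature fires on only a handful of tokens per ten thousand, which is typical of a sparse feature.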