INDEX
Explanations
phrases or terms related to specific companies or organizations
proper nouns, particularly names and organizations
Negative Logits
ments        -1.08
uably        -0.95
ously        -0.94
istically    -0.92
istics       -0.85
ania         -0.82
ancy         -0.78
naire        -0.77
chest        -0.76
mentation    -0.75
Positive Logits
ignt         0.78
hao          0.76
hops         0.75
meier        0.74
mathemat     0.71
alez         0.71
nesday       0.66
abee         0.66
kj           0.65
pher         0.63
Activations Density: 0.062%