INDEX
Explanations
proper nouns
proper nouns, especially names of individuals
Negative Logits
Token         Logit
CLASSIFIED    -0.73
¶ħ            -0.69
Ń·            -0.66
\'            -0.64
Âł Âł         -0.63
İĭ            -0.63
Wilderness    -0.63
---------     -0.62
Disneyland    -0.62
..........    -0.62
Positive Logits
Token         Logit
mort          0.86
sin           0.84
inf           0.79
make          0.77
top           0.76
lip           0.76
v             0.76
win           0.76
hex           0.74
mor           0.74
Activations Density 0.276%