Explanations
proper nouns and names, particularly related to movies, actors, politicians, and places
Negative Logits
Token       Logit
rane        -0.73
mble        -0.61
Bowen       -0.59
erella      -0.59
critically  -0.59
aspers      -0.59
Ͻ           -0.58
ollen       -0.58
ãĥ¼ãĥĨ      -0.57
Perspect    -0.56
Positive Logits
Token       Logit
whatsoever  1.35
nor         0.82
brainer     0.76
ody         0.72
onsense     0.71
*/(         0.71
hawk        0.66
anymore     0.66
dime        0.65
hesitation  0.64
Activations Density: 0.093%
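
The token ãĥ¼ãĥĨ in the negative logits list looks like the printable byte-level form that GPT-2-style BPE vocabularies use for raw UTF-8 bytes. The page does not say which tokenizer the model uses, so treat this as an assumption. Under that assumption, a minimal sketch for turning such a display string back into readable text might look like the following; `decode_token` is a hypothetical helper name, not part of any dashboard API.

```python
def _byte_to_char_table():
    """Rebuild the byte -> printable-character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable are displayed as themselves.
    printable = (
        list(range(0x21, 0x7F))      # '!' .. '~'
        + list(range(0xA1, 0xAD))    # U+00A1 .. U+00AC
        + list(range(0xAE, 0x100))   # U+00AE .. U+00FF
    )
    printable_set = set(printable)
    table = {b: chr(b) for b in printable}
    # Every other byte is shifted to an unused code point starting at U+0100.
    offset = 0
    for b in range(256):
        if b not in printable_set:
            table[b] = chr(256 + offset)
            offset += 1
    return table


# Inverse table: display character -> original byte value.
_CHAR_TO_BYTE = {c: b for b, c in _byte_to_char_table().items()}


def decode_token(token: str) -> str:
    """Decode a byte-level display token to text (hypothetical helper).

    Assumes a GPT-2-style byte-level vocabulary. Tokens containing characters
    outside that alphabet are returned unchanged.
    """
    if any(ch not in _CHAR_TO_BYTE for ch in token):
        return token
    raw = bytes(_CHAR_TO_BYTE[ch] for ch in token)
    # Partial multi-byte sequences are shown with the replacement character.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥ¼ãĥĨ", "whatsoever", "*/("]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as whatsoever pass through unchanged, so the helper can be applied to every row of both lists without special-casing.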