INDEX
Explanations
proper names related to politics and film
proper nouns, specifically names or titles
Negative Logits
stack       -0.74
Joined      -0.66
athering    -0.65
mantle      -0.65
vidia       -0.64
matter      -0.61
behave      -0.60
iveness     -0.59
guilt       -0.59
disk        -0.58
Positive Logits
oother      0.83
iverpool    0.82
akeru       0.73
anwhile     0.70
BRE         0.69
çļ          0.69
adesh       0.68
bott        0.68
OHN         0.67
FK          0.66
Activations Density 0.000%