Explanations
mention of well-known entities or figures
references to well-known individuals or entities
Negative Logits
plet       -1.07
onies      -0.85
ossession  -0.85
apons      -0.84
otos       -0.83
otion      -0.83
xit        -0.81
cair       -0.79
otor       -0.77
antha      -0.76
Positive Logits
tale       0.80
itarian    0.77
landmarks  0.72
ties       0.72
stood      0.70
Famous     0.68
wealth     0.66
ness       0.65
Offline    0.64
falsehood  0.64
Activation Density: 0.017%