INDEX
Explanations
proper names, especially those repeated multiple times
names of people and entities
Negative Logits
road         -0.69
door         -0.68
gar          -0.66
ral          -0.65
locked       -0.65
treadmill    -0.65
spin         -0.65
loaded       -0.64
stall        -0.63
count        -0.63
Positive Logits
agne              0.91
ENS               0.89
éĹĺ               0.89
soDeliveryDate    0.87
ensen             0.86
enstein           0.85
\<                0.81
enegger           0.81
aints             0.79
ichick            0.79
Activations Density 0.018%
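
The density figure can be read as the share of token positions on which the feature fires. The page does not say how it is computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical, chosen so that the result matches the 0.018% shown above.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active; the source does not define it.
    """
    values = list(activations)
    if not values:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in values if a > threshold)
    return firing / len(values)


# Hypothetical data: the feature fires on 18 of 100,000 token positions.
acts = [1.0] * 18 + [0.0] * 99_982
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.018%
```

At this density the feature is active on roughly one token in every 5,500, which fits a feature that responds only to a narrow class of tokens such as proper names.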