INDEX
Explanations
the word "Lincoln"
specific names or identifiers related to magnets
Negative Logits
eele              -0.71
needless          -0.67
KP                -0.63
Laugh             -0.63
comrade           -0.62
sarc              -0.62
misunderstand     -0.62
misunderstanding  -0.62
colleague         -0.62
Mountain          -0.61
Positive Logits
Ŀ                 1.55
oln               1.06
¦                 0.86
ħĭ                0.81
ypes              0.78
ģ                 0.76
nen               0.74
zzle              0.73
ī                 0.72
esta              0.70
Activations Density: 0.000%