INDEX
Explanations
words related to historical events or figures
references to historical topics or figures
Negative Logits
holder        -0.69
FCC           -0.69
Wonderland    -0.67
thumbs        -0.65
BOX           -0.64
learning      -0.64
loophole      -0.63
OMG           -0.63
loose         -0.61
MO            -0.61
Positive Logits
orically      1.77
orical        1.66
oric          1.58
orians        1.41
orian         1.38
oria          1.27
adr           1.21
oire          1.14
orie          1.14
ori           1.14
Activations Density: 0.039%