INDEX
Explanations
references to specific years or historical events
Negative Logits
å´          -0.16
leigh       -0.14
fg          -0.14
olumbia     -0.14
Reform      -0.13
ơi          -0.13
oretical    -0.13
nt          -0.13
elo         -0.13
zl          -0.13
Positive Logits
s           0.26
sWith       0.18
ies         0.18
sik         0.15
swith       0.15
sı          0.15
serrat      0.15
sled        0.15
sand        0.15
-Ñħ         0.15
Activations Density: 0.034%