INDEX
Explanations
words related to time periods, particularly decades
Negative Logits
房           -0.74
la           -0.69
e            -0.68
y            -0.66
er           -0.65
a            -0.65
sam          -0.65
Angelina     -0.65
心           -0.64
tab          -0.64
Positive Logits
jectures     1.10
etheless     1.08
Eſ           1.04
theless      1.02
ſelf         0.98
Cyfeiriadau  0.96
myſelf       0.96
             0.96
ſeveral      0.95
whoſe        0.94
Activations Density 0.122%