INDEX
Explanations
names of political figures and locations
NEGATIVE LOGITS
etheless    -0.81
sight       -0.70
directions  -0.69
ãģį         -0.68
limited     -0.68
fringe      -0.66
stroke      -0.65
cipline     -0.65
pinch       -0.65
lihood      -0.65
POSITIVE LOGITS
isha        1.17
ona         1.10
amon        1.05
onda        1.02
istar       1.00
lem         0.99
anta        0.98
ira         0.97
ussia       0.97
iba         0.97
Activations Density: 1.277%