INDEX
Explanations
names of people and places, particularly in a political context
Negative Logits
Token      Logit
myſelf     -0.98
pleaſure   -0.93
ſtate      -0.92
ſelf       -0.92
itſelf     -0.90
houſe      -0.90
faſt       -0.88
Efq        -0.86
raiſ       -0.86
purpoſe    -0.85
Positive Logits
Token      Logit
aarrggbb   0.58
,          0.51
u          0.48
m          0.48
/          0.47
wa         0.47
?          0.47
i          0.46
y          0.46
im         0.46
Activations Density: 0.016%