Explanations
words and phrases associated with influential figures and their impact in various contexts
Negative Logits
itſelf      -1.62
houſe       -1.55
purpoſe     -1.55
pleaſure    -1.53
ſtate       -1.49
myſelf      -1.44
Houſe       -1.42
ſche        -1.38
iſt         -1.37
ſever       -1.36
Positive Logits
(           1.16
,           1.09
            1.07
:           0.99
/           0.97
"           0.96
-           0.95
-           0.92
?           0.90
;           0.89
Activations Density 7.348%