Explanations
references to the name "George."
Negative Logits
certain       -0.19
argas         -0.18
threshold     -0.18
Certain       -0.17
orca          -0.16
ORS           -0.16
thresholds    -0.15
threshold     -0.15
thew          -0.15
orda          -0.15
Positive Logits
aint          0.18
elts          0.17
anna          0.16
lbrace        0.16
Washington    0.16
olid          0.16
hausen        0.16
oj            0.16
urm           0.16
Orwell        0.15
Activations Density: 0.020%