INDEX
Explanations
official titles or positions held by individuals
terms related to important choices or options
Negative Logits
hen         -0.68
cius        -0.68
sv          -0.62
visible     -0.61
482         -0.61
athi        -0.61
Gleaming    -0.61
zek         -0.60
Dash        -0.59
ku          -0.58
Positive Logits
for         1.22
for         1.09
FOR         1.01
FOR         0.96
For         0.89
For         0.85
abase       0.69
fore        0.68
fort        0.67
against     0.67
Activations Density: 0.175%