INDEX
Explanations
phrases related to evidence or proof
negative phrases related to validity or evidence
Negative Logits
Token      Logit
Klux       -0.79
Dos        -0.78
Norris     -0.67
AVG        -0.63
Typhoon    -0.58
Dame       -0.58
iper       -0.58
subpoen    -0.57
Kru        -0.57
sts        -0.57
Positive Logits
Token      Logit
based      1.16
driven     0.99
laden      0.98
oriented   0.98
bearing    0.95
matter     0.94
seeking    0.93
of         0.92
heavy      0.92
rich       0.91
Activations Density: 0.078%