INDEX
Explanations
actions involving physical aggression or conflict
instances of the word "and."
Negative Logits
onymous       -0.82
éŃĶ           -0.78
:(            -0.76
:[            -0.74
Ĥİ            -0.73
ESE           -0.72
uthor         -0.70
Student       -0.70
BIL           -0.70
Commission    -0.69
Positive Logits
thus          0.92
consequently  0.92
therefore     0.88
assorted      0.86
hence         0.85
then          0.83
possibly      0.81
flats         0.79
vice          0.79
rogens        0.78
Activations Density 0.884%