Explanations
instances of societal or political commentary
Negative Logits
plusplus      -0.17
ambiguous     -0.15
"description  -0.15
゛            -0.15
存于          -0.14
otton         -0.14
alendar       -0.14
ausible       -0.13
ipur          -0.13
ypo           -0.13
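Two of the negative-logit tokens come from the byte-level BPE display alphabet used by GPT-2-style tokenizers. That alphabet shows each raw byte as a printable character, so non-ASCII UTF-8 text appears as strings like ãĤĽ. The sketch below rebuilds GPT-2's standard byte-to-character table and reverses it, assuming this model's tokenizer uses the same table (the dashboard does not name the tokenizer). `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild GPT-2's byte -> printable-character table."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte (control chars, space, 0x7F-0xA0, 0xAD)
    # is shifted into the range starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: display character -> raw byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map each display character back to its byte, then decode the bytes as UTF-8."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


# "Ġ" is how a leading space is displayed in GPT-2 vocabularies.
for tok in ["ãĤĽ", "åŃĺäºİ", "Ġshows"]:
    print(f"{tok!r} -> {decode_token(tok)!r}")
```

Under this mapping, ãĤĽ is the bytes E3 82 9B (U+309B, the kana voiced sound mark ゛) and åŃĺäºİ is the Chinese 存于.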
Positive Logits
shows         0.36
indicates     0.34
show          0.33
indicate      0.33
demonstrate   0.32
demonstrates  0.32
speaks        0.32
att           0.30
speak         0.29
testify       0.28
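The logit lists above score each token by how strongly this feature pushes its output logit down or up. The dashboard does not say how the scores are computed. This sketch assumes a common approach: project the feature's decoder direction through the model's unembedding matrix, then read off the most negative and most positive entries. The function name, argument names, and the toy random weights are hypothetical stand-ins, not the model's actual parameters.

```python
import numpy as np


def logit_effects(decoder_dir, W_U, vocab, k=5):
    """Return the k lowest- and k highest-scoring tokens for one feature.

    decoder_dir: (d_model,) decoder direction of the feature (hypothetical input)
    W_U:         (d_model, d_vocab) unembedding matrix
    vocab:       list of d_vocab token strings
    """
    # Direct effect of one unit of feature activation on every output logit.
    effects = decoder_dir @ W_U                # shape (d_vocab,)
    order = np.argsort(effects)                # indices sorted ascending
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: random weights standing in for a real model and SAE.
rng = np.random.default_rng(0)
d_model, d_vocab = 16, 40
W_U = rng.normal(size=(d_model, d_vocab))
decoder_dir = rng.normal(size=d_model)
decoder_dir /= np.linalg.norm(decoder_dir)     # unit-norm decoder row
vocab = [f"tok{i}" for i in range(d_vocab)]

negative, positive = logit_effects(decoder_dir, W_U, vocab, k=3)
print("Negative:", negative)
print("Positive:", positive)
```

With real weights, `decoder_dir` would be the feature's row of the SAE decoder and `W_U` the model's unembedding. Some tools also normalize that row or fold in the final layer norm first, which this sketch omits.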
Activation Density: 0.614%
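Activation density is the share of sampled tokens on which the feature is active. This minimal sketch assumes that "active" means an activation strictly above zero, because the dashboard does not state its threshold or sample size. The 100,000-token array is synthetic and only shows what 0.614% means in counts: about 614 active tokens per 100,000.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of sampled tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))


# Synthetic activations: 614 of 100,000 tokens fire, the rest are zero.
acts = np.zeros(100_000)
acts[:614] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.614%
```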