INDEX
Explanations
discussions around political and social issues
Negative Logits
contain    -0.16
.Italic    -0.16
NK         -0.15
Ngh        -0.15
consist    -0.15
expire     -0.15
obra       -0.14
olumn      -0.14
jourd      -0.14
beit       -0.14
Positive Logits
character      0.36
character      0.32
characterize   0.30
Character      0.29
accompany      0.27
Character      0.27
accompanies    0.27
karakter       0.26
CHARACTER      0.25
caratter       0.25
Activations Density 0.251%