Explanations
words associated with controversy or significant reactions
Negative Logits
apult        -0.15
pokoj        -0.14
DISCLAIMER   -0.14
ihilation    -0.14
lops         -0.13
erg          -0.13
laz          -0.13
spor         -0.13
Mey          -0.13
é¨ĵ          -0.13
Positive Logits
comm         0.41
stir         0.37
fur          0.36
hull         0.35
fuss         0.32
ker          0.32
hue          0.31
hub          0.30
fur          0.29
hoop         0.29
Activations Density 0.173%