INDEX
Explanations
terms and phrases related to opposition or being against something
Negative Logits
#          -0.16
/testify   -0.16
ican       -0.14
akis       -0.14
ai         -0.14
оду        -0.14
panies     -0.13
odu        -0.13
berger     -0.13
stag       -0.13
Positive Logits
(er        0.15
-in        0.15
inue       0.14
iphery     0.14
ụn         0.14
uild       0.14
-          0.14
-through   0.14
ilter      0.14
ONTAL      0.14
Activations Density 0.187%