Explanations
Phrases related to collective experiences and observations.
Negative Logits
iva         -0.16
ia          -0.14
ve          -0.14
rak         -0.14
Overall     -0.14
uz          -0.14
copyright   -0.14
fak         -0.14
erek        -0.13
ween        -0.13
Positive Logits
PERT        0.17
even        0.15
باش         0.14
レット      0.14
even        0.14
ashboard    0.13
شر          0.13
даже        0.13
596         0.13
_LARGE      0.13
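The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of how such lists can be produced. It assumes the common sparse-autoencoder setup, where a feature's logit effect is its decoder direction multiplied by the model's unembedding matrix. The names `logit_effects`, `decoder_dir`, and `unembed` are hypothetical, and plain nested lists stand in for real weight tensors.

```python
from typing import List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    decoder_dir: Sequence[float],        # hypothetical: feature decoder vector, length d_model
    unembed: Sequence[Sequence[float]],  # hypothetical: unembedding matrix, shape [d_model][vocab]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Return (most suppressed, most promoted) tokens, k of each."""
    d_model = len(decoder_dir)
    if len(unembed) != d_model or any(len(row) != len(vocab) for row in unembed):
        raise ValueError("unembed must have shape [len(decoder_dir)][len(vocab)]")
    # Logit effect of each token: dot product of the decoder direction
    # with that token's column of the unembedding matrix.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]         # most negative effect first
    positive = ranked[::-1][:k]   # most positive effect first
    return negative, positive


if __name__ == "__main__":
    neg, pos = logit_effects(
        [1.0, -0.5],
        [[0.3, -0.1, 0.0], [0.0, 0.2, -0.4]],
        ["a", "b", "c"],
        k=1,
    )
    print([(t, round(s, 2)) for t, s in neg])  # [('b', -0.2)]
    print([(t, round(s, 2)) for t, s in pos])  # [('a', 0.3)]
```

Because the ranking runs over vocabulary entries rather than display strings, two distinct tokens that render identically (for example, one with a leading space) can both appear. That may explain the two `even` rows.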
Activation density: 0.049%
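Activation density is usually read as the share of tokens in the sampled dataset on which the feature fires. Below is a minimal sketch under that assumption. It uses a hypothetical `activation_density` helper and a firing threshold of zero. Under this reading, a density of 0.049% corresponds to about 49 active tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 49 firing tokens out of 100,000 sampled tokens.
    acts = [0.0] * 99_951 + [1.0] * 49
    print(f"{activation_density(acts):.3f}%")  # 0.049%
```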