INDEX
Explanations
specific verb phrases indicating actions and states
Negative Logits
ingu      -0.15
eg        -0.15
Fuck      -0.15
ibur      -0.14
ï½ŀ       -0.14
æĸ°èģŀ    -0.14
idan      -0.14
fuck      -0.14
opr       -0.14
ibu       -0.14
Positive Logits
ï¸              0.14
Nich            0.14
å·Ŀ             0.13
.acc            0.13
rek             0.13
alphabetical    0.13
Dud             0.13
anders          0.13
.googleapis     0.13
Fits            0.13
Activations Density 0.000%