INDEX
Explanations
negative expressions or rejections of ideas
Negative Logits
Token           Logit
hel             -0.16
ricks           -0.16
noxious         -0.15
Contribution    -0.14
off             -0.14
IELDS           -0.14
fair            -0.13
maximal         -0.13
Hel             -0.13
bung            -0.13
Positive Logits
Token           Logit
adan            0.16
ze              0.15
_mE             0.15
سته             0.15
HeaderCode      0.15
ازه             0.14
ổ               0.14
ッカー          0.14
MyBase          0.14
柱              0.14
Activations Density 0.105%
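
The density figure above is presumably the fraction of sampled token positions on which the feature fires. The dashboard does not state its exact definition, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" on a token when its
    activation is strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data, not from the dashboard: 2000 token positions,
# of which only 2 have a non-zero activation.
acts = [0.0] * 2000
acts[17] = 1.3
acts[940] = 0.4

density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.100%"
```

Under this reading, a density of 0.105% would mean the feature is active on roughly one token position in every thousand sampled.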