Explanations
words related to restrictions or prohibitions
words related to negative or limiting conditions
Negative Logits
mble          -0.67
ーテ          -0.63
iffin         -0.61
Enhancement   -0.60
etts          -0.57
rane          -0.57
irez          -0.57
oult          -0.56
conver        -0.54
arbon         -0.54
Positive Logits
whatsoever     0.94
urnal          0.92
onsense        0.80
mber           0.76
ilings         0.70
hawk           0.68
Chomsky        0.66
omi            0.65
Pradesh        0.64
ody            0.63
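Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that procedure. The function name `feature_logit_effects`, the `d_model x vocab_size` layout of the unembedding, and the toy weights are hypothetical, so the sketch cannot reproduce the values in the tables above.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def feature_logit_effects(
    decoder_direction: Sequence[float],
    unembedding: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Pair], List[Pair]]:
    """Project a feature's decoder direction through the unembedding.

    Assumes (hypothetically) that `unembedding` has one row per model
    dimension and one column per vocabulary token. Returns the k most
    negative and k most positive (token, logit) pairs.
    """
    d_model = len(decoder_direction)
    if len(unembedding) != d_model:
        raise ValueError("unembedding must have one row per model dimension")

    # Logit for token t is the dot product of the decoder direction
    # with column t of the unembedding.
    logits = [
        sum(decoder_direction[i] * unembedding[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]

    # Sort ascending: the head holds the most negative logits,
    # the reversed head holds the most positive ones.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    top_positive = ranked[::-1][:k]
    return top_negative, top_positive


# Toy example with made-up weights (not the real model's values).
toy_neg, toy_pos = feature_logit_effects(
    [1.0, 0.5],
    [[1.0, -0.5, 0.0], [0.5, -0.25, 0.0]],
    ["whatsoever", "mble", "the"],
    k=1,
)
print(toy_neg, toy_pos)  # [('mble', -0.625)] [('whatsoever', 1.25)]
```

Sorting the full vocabulary keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` and `heapq.nlargest` would avoid a full sort.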
Activation density: 0.070%
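A density of 0.070% means the feature is active on roughly 7 of every 10,000 tokens. The sketch below assumes density is the fraction of tokens in an evaluation sample on which the feature's activation is strictly greater than zero. The function name `activation_density`, the `threshold` parameter, and the toy sample are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumes (hypothetically) one activation value per token in the sample.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature "fires".
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy sample: the feature fires on 7 of 10,000 tokens.
sample = [0.0] * 9_993 + [1.2] * 7
density = activation_density(sample)
print(f"{density:.3%}")  # 0.070%
```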