INDEX
Explanations
instances of resistance or opposition in various contexts
Negative Logits
alous     -0.17
ntag      -0.17
efully    -0.15
rell      -0.14
uncio     -0.14
ittal     -0.14
ạn        -0.14
éĿ©       -0.14
esture    -0.13
ırak      -0.13
Positive Logits
back      0.56
back      0.53
-back     0.43
Back      0.41
_back     0.40
BACK      0.39
.back     0.39
Back      0.38
BACK      0.34
zurück    0.34
Activations Density 0.019%
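
As a rough illustration of how a figure like the density above could be read, the sketch below assumes that "density" means the percentage of sampled tokens on which the feature's activation is above zero. The page itself does not state the definition, so that reading, the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature
    fires at all. The dashboard does not define the term.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


# Made-up sample: 10,000 tokens, with the feature firing on two of them.
acts = [0.0] * 10_000
acts[17] = 0.8
acts[4242] = 1.3

density = activation_density(acts)
print(f"{density:.3f}%")  # prints "0.020%"
```

Under this reading, a density of 0.019% would correspond to the feature firing on roughly 19 of every 100,000 tokens.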