INDEX
Explanations
phrases related to permission or freedom to act as one desires
expressions of desire or permission
Negative Logits
Token      Logit
ricks      -0.73
ynski      -0.72
Berk       -0.67
riot       -0.64
errors     -0.62
bug        -0.60
riots      -0.58
utenant    -0.58
enthusi    -0.58
hero       -0.57
Positive Logits
Token      Logit
to         0.71
urities    0.69
GB         0.65
edIn       0.63
mate       0.63
quotas     0.61
awaru      0.61
htar       0.60
uld        0.60
atra       0.58
Activations Density: 0.078%