Explanations
phrases related to rules, regulations, or restrictions
terms related to restrictions or prohibitions
Negative Logits
eon       -0.83
eah       -0.74
itched    -0.69
Balkans   -0.68
neau      -0.66
lycer     -0.66
tops      -0.65
ilant     -0.62
ipel      -0.62
tone      -0.61
Positive Logits
tampering
0.78
untarily
0.77
idding
0.75
gambling
0.74
smoking
0.74
bidden
0.74
substances
0.72
misuse
0.69
prohibited
0.69
¿½
0.68
Activations Density 0.056%