INDEX
Explanations
phrases that indicate legal status or citizenship
Negative Logits
cer     -0.68
KEY     -0.68
then    -0.68
chie    -0.65
ck      -0.62
LC      -0.61
dt      -0.60
call    -0.60
key     -0.59
キ      -0.58
Positive Logits
osuke       0.74
olitan      0.69
ylum        0.66
agher       0.66
inous       0.64
neglig      0.63
Released    0.62
uana        0.61
endon       0.61
dies        0.61
Activations Density: 0.248%