INDEX
Explanations
words related to legal or political matters
references to financial transactions or economic activities
Negative Logits
Transcript       -0.68
UID              -0.66
STA              -0.63
Wrong            -0.63
ESE              -0.63
CoC              -0.61
Sta              -0.58
Organizations    -0.57
THESE            -0.56
THIS             -0.56
Positive Logits
theirs            1.05
likewise          0.81
langu             0.76
elsewhere         0.68
hers              0.67
cill              0.67
outright          0.66
unaffected        0.65
iser              0.65
equally           0.65
Activations Density: 0.609%