INDEX
Explanations
mentions of specific names and terms associated with notable individuals and places
Negative Logits
IMP        -0.77
Force      -0.76
IB         -0.76
jails      -0.76
retarded   -0.75
FML        -0.74
DEF        -0.74
indo       -0.72
IM         -0.69
force      -0.68
Positive Logits
choes      1.04
atha       0.99
ét         0.97
rawl       0.97
ĸļ         0.95
onial      0.93
ón         0.91
ĸļ士       0.91
enei       0.91
yna        0.89
Activations Density 0.111%