Explanations
phrases related to making statements or declarations
references to significant events or statements made by public figures
Negative Logits
helm         -0.67
blogspot     -0.64
ufact        -0.61
DEN          -0.61
MN           -0.61
ATED         -0.60
WER          -0.60
メ           -0.59
distingu     -0.58
destro       -0.58
Positive Logits
bluff         1.04
hotline       0.85
Cth           0.70
kettle        0.67
ugly          0.64
Behavior      0.64
SourceFile    0.62
derogatory    0.60
NCT           0.60
Deity         0.58
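The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. Dashboards of this kind commonly compute that ranking by taking the dot product of the feature's residual-stream (decoder) direction with each token's unembedding vector and sorting the results. The page does not state its method, so treat this as an assumption. The sketch below is a minimal pure-Python illustration. The names `decoder_dir`, `unembed`, `logit_effects`, and `top_logits` are hypothetical, and the toy two-dimensional vectors are invented. They are not taken from the model behind this page.

```python
# Hypothetical sketch: rank tokens by the direct effect of a feature's
# direction on the output logits (dot product with each unembedding vector).
# Assumes `decoder_dir` is the feature's residual-stream direction and
# `unembed` maps each token string to its unembedding vector.

def logit_effects(decoder_dir, unembed):
    """Return {token: dot(decoder_dir, unembed[token])}."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_logits(effects, k, negative=False):
    """Return the k most positive (or, if negative=True, most negative) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=not negative)
    return ranked[:k]

# Toy example: 2-dimensional residual stream, invented numbers.
decoder_dir = [1.0, -0.5]
unembed = {
    "tok_a": [0.8, -0.4],   # aligned with the direction -> boosted
    "tok_b": [0.5, -0.2],
    "tok_c": [-0.4, 0.5],   # opposed to the direction -> suppressed
    "tok_d": [-0.3, 0.6],
}
effects = logit_effects(decoder_dir, unembed)
print(top_logits(effects, 2))                 # most-boosted tokens
print(top_logits(effects, 2, negative=True))  # most-suppressed tokens
```

Sorting the full dictionary is fine for a toy vocabulary. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` avoids a full sort.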
Activation density: 0.133%
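The page does not define "activation density." This sketch assumes the usual dashboard convention: the fraction of token positions in a sample where the feature's activation is above zero. Under that assumption, 0.133% means the feature fires on roughly 1.3 of every 1,000 tokens. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
# Hypothetical sketch: activation density as the fraction of token
# positions where the feature's activation exceeds a threshold (default 0).

def activation_density(activations, threshold=0.0):
    """Return the fraction of positions with activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy example: 3 active positions out of 2,000 tokens.
acts = [0.0] * 2000
acts[10] = 1.2
acts[500] = 0.4
acts[1999] = 2.0
print(f"{activation_density(acts):.3%}")  # prints 0.150%
```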