INDEX
Explanations
phrases related to statements made by authorities or officials
instances of reported speech
Negative Logits
ptives   -0.69
▬▬       -0.67
obyl     -0.66
░░       -0.66
rafted   -0.64
oil      -0.64
Fuck     -0.61
ement    -0.60
Lot      -0.59
olon     -0.59
Positive Logits
enz      0.71
doms     0.66
sth      0.64
itud     0.62
Fargo    0.61
aides    0.58
dism     0.58
indo     0.58
anecd    0.57
è£ıè     0.57
Activations Density 0.172%