INDEX
Explanations
opinions or beliefs expressed by individuals
references to people's opinions or beliefs
Negative Logits
clad          -0.75
ague          -0.68
clad          -0.64
iona          -0.64
Adin          -0.63
çĦ            -0.62
eding         -0.62
ORPG          -0.61
conservancy   -0.61
Peninsula     -0.61
Positive Logits
provoking     0.78
aloud         0.75
differently   0.72
ij士           0.70
ileaks        0.68
about         0.68
IUM           0.65
ventus        0.65
alike         0.65
wrongly       0.64
Activations Density 0.075%