INDEX
Explanations
words related to personal opinions or declarations
phrases related to statements of denial or negation
Negative Logits
alore        -0.69
ammy         -0.66
iage         -0.65
fortun       -0.64
pressures    -0.64
Gutenberg    -0.63
scattering   -0.62
anmar        -0.62
presumed     -0.61
pse          -0.60
Positive Logits
ï¸ı          0.86
İ            0.82
VICE         0.80
hall         0.78
ï¸           0.76
hood         0.74
worthiness   0.73
ÙĦ           0.72
Balt         0.72
sure         0.72
Activations Density 0.293%