Explanations
words related to actions that imply judgment or evaluation
Negative Logits
anny      -0.16
者の      -0.14
prm       -0.14
amd       -0.14
gles      -0.13
fx        -0.13
Sher      -0.13
یک        -0.13
aviest    -0.13
vt        -0.13
Positive Logits
ed         1.90
edBy       1.05
edb        0.90
edn        0.87
edl        0.85
edm        0.72
ED         0.72
edir       0.67
edata      0.66
edd        0.63
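Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most positive and most negative entries. The sketch below assumes exactly that method (it ignores any final layer norm), and the function name `feature_logit_effects` and the toy inputs are hypothetical, not taken from this dashboard.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical W_U, shape [d_model][vocab]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) tokens by direct logit effect.

    Assumption: the effect on token t is dot(decoder_dir, W_U[:, t]).
    """
    d_model = len(decoder_dir)
    # Direct contribution of the feature direction to every token's logit.
    effects = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    pairs = list(zip(vocab, effects))
    # Largest effects first for the positive list, smallest first for the negative list.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


if __name__ == "__main__":
    pos, neg = feature_logit_effects(
        [1.0, 0.0],
        [[2.0, -1.0, 0.5], [0.0, 3.0, 0.0]],
        ["ed", "vt", "x"],
        k=2,
    )
    print(pos)  # [('ed', 2.0), ('x', 0.5)]
    print(neg)  # [('vt', -1.0), ('x', 0.5)]
```

With a real model the vocabulary has tens of thousands of entries, so a vectorized matrix product would replace the nested loop; the pure-Python version only illustrates the arithmetic.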
Activation Density: 0.424%
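Activation density is usually read as the fraction of tokens on which the feature fires. The sketch below assumes a token counts as "active" when its activation is strictly greater than zero; the function name `activation_density` and the sample data are hypothetical, and nothing here reflects the size of the dataset behind the figure above.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


if __name__ == "__main__":
    # Illustrative only: 424 active tokens out of 100,000.
    acts = [1.0] * 424 + [0.0] * (100_000 - 424)
    print(f"{activation_density(acts):.3%}")  # 0.424%
```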