INDEX
Explanations
words related to moral values and principles
plural nouns or adjectives and words related to groups or categories
Negative Logits
PET       -0.79
amba      -0.70
Tex       -0.66
INO       -0.66
UL        -0.63
UCK       -0.62
window    -0.61
LECT      -0.60
ATED      -0.59
Sharp     -0.59
Positive Logits
gemony    0.90
ashtra    0.80
rahim     0.78
apego     0.78
ndra      0.73
brids     0.73
ths       0.72
ilege     0.71
zsche     0.70
andum     0.70
Activations Density 0.221%