Explanations
Proper nouns
Keywords related to specific individuals and their influence or actions
Negative Logits
Token        Logit
logical      -0.70
ש            -0.70
é¾           -0.68
Bulg         -0.65
Grid         -0.63
metabolism   -0.62
worthiness   -0.60
WARD         -0.60
Decay        -0.60
çĦ           -0.59
Positive Logits
Token        Logit
quist         1.60
gren          1.02
acion         0.90
ensen         0.89
thal          0.89
submar        0.85
inav          0.84
bilt          0.82
ophon         0.82
qv            0.82
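The dashboard does not say how its logit values were computed. A common approach for SAE feature pages (an assumption here) is to take the dot product of the feature's decoder direction with every token's unembedding vector, then list the highest and lowest scores. The minimal pure-Python sketch below shows that ranking step; the function names `logit_effects` and `top_and_bottom` are hypothetical, and the 2-dimensional vectors are made-up toy values, not this feature's real weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float], unembed: Sequence[Sequence[float]]
) -> List[float]:
    """Return one score per vocabulary token: the dot product of the feature's
    decoder direction with that token's unembedding vector.

    Assumes unembed[i] holds the d_model-length vector for token i.
    """
    return [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed]


def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted (positive) and k most suppressed (negative) tokens."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # largest scores first
    negative = ranked[:k]        # most negative scores first
    return positive, negative


# Toy 2-dimensional example; these vectors are invented for illustration only.
vocab = ["quist", "gren", "logical", "Grid"]
unembed = [[2.0, 0.0], [1.0, 5.0], [-1.0, 3.0], [-0.5, 0.0]]
decoder_dir = [1.0, 0.0]

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), vocab, k=2)
print("positive:", positive)  # positive: [('quist', 2.0), ('gren', 1.0)]
print("negative:", negative)  # negative: [('logical', -1.0), ('Grid', -0.5)]
```

Real pipelines typically do the same projection as one matrix product over the full vocabulary and may or may not fold in the final layer norm; the sorting step is unchanged either way.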
Activation density: 0.006%
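A density of 0.006% corresponds to a fraction of 0.006 / 100 = 6 × 10⁻⁵, that is, the feature fires on roughly 6 of every 100,000 tokens. The sketch below assumes density means the fraction of token positions whose activation is above zero; the dashboard does not state its threshold or the dataset it measured. `activation_density` is a hypothetical helper, and the sample activations are invented.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up sample: 6 firing positions out of 100,000 tokens.
acts = [0.0] * 99_994 + [1.0] * 6
density = activation_density(acts)
print(f"{density:.3%}")  # 0.006%
```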