Explanations
Honesty- and honor-related words
Negative Logits
Token     Logit
long      0.78
ISA       0.69
crowds    0.66
dép       0.65
LONG      0.65
citizen   0.64
symptom   0.64
metam     0.64
stellar   0.63
MSCI      0.63
Positive Logits
Token     Logit
oring     1.70
ored      1.56
orable    1.52
ors       1.50
ouring    1.42
oured     1.42
ores      1.26
oration   1.26
ORS       1.24
orar      1.24
Activation density: 0.026%