Explanations
expressions indicating a lack of awareness or knowledge
Negative Logits
Token           Logit
rrggbb          -0.56
astify          -0.56
ffilm           -0.52
CrossRef        -0.50
femininos       -0.50
femininas       -0.49
transQ          -0.49
OrBuilder       -0.48
nakalista       -0.47
Билгалдахарш    -0.47
Positive Logits
Token           Logit
unaware         0.69
oblivious       0.59
unknowingly     0.52
ignorance       0.51
unwittingly     0.46
Ignorance       0.45
clueless        0.45
unsuspecting    0.43
ignorant        0.43
不知            0.42   (Chinese: "not know")
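Token lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most positive and most negative tokens. This page does not say how its values were computed, so the sketch below only assumes that method. `logit_effects`, `top_and_bottom`, and every vector, matrix, and vocabulary entry in it are hypothetical toy stand-ins, not this feature's weights.

```python
# Hypothetical sketch, assuming logit effects = decoder direction @ unembedding.
# All numbers below are toy values, not the weights behind this feature card.


def logit_effects(decoder_dir, unembed, vocab):
    """Project a feature's decoder direction onto every vocabulary token.

    decoder_dir: d_model floats (assumed SAE decoder row for the feature)
    unembed: d_model rows, each with vocab_size floats (assumed W_U)
    vocab: vocab_size token strings
    Returns (token, score) pairs, one per vocabulary entry.
    """
    vocab_size = len(vocab)
    scores = [0.0] * vocab_size
    for i, weight in enumerate(decoder_dir):
        row = unembed[i]
        for j in range(vocab_size):
            scores[j] += weight * row[j]  # accumulate the dot product per token
    return list(zip(vocab, scores))


def top_and_bottom(pairs, k):
    """Return the k lowest-scoring and k highest-scoring (token, score) pairs."""
    ranked = sorted(pairs, key=lambda p: p[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["unaware", "oblivious", "rrggbb", "astify"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.5, 0.4, -0.3, -0.2],
    [0.3, 0.2, -0.4, -0.4],
]
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print("negative:", neg)
print("positive:", pos)
```

Real vocabularies have tens of thousands of entries, so a matrix library would normally replace the nested loops; the plain-Python version just keeps the projection explicit.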
Activation Density: 0.459%
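A density figure like this is usually the percentage of sampled tokens on which the feature fires. The sketch below assumes that definition with a firing threshold of zero; the card does not state its exact definition or sample size, and `activation_density` plus the toy activations are hypothetical.

```python
# Hypothetical sketch, assuming density = share of tokens whose activation
# exceeds a threshold (0.0 by default), reported as a percentage.


def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy activations for 8 tokens; 3 of them are above zero.
toy_acts = [0.0, 0.8, 0.0, 0.0, 1.3, 0.0, 0.2, 0.0]
density = activation_density(toy_acts)
print(f"{density}%")
```

Raising `threshold` counts only stronger activations, which lowers the reported density.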