INDEX
Explanations
phrases related to human expressions and interactions
capital letters and punctuation, indicating emphasis or titles
Negative Logits
formerly      -0.70
Presence      -0.70
BMC           -0.67
Proceedings   -0.65
occupation    -0.65
influential   -0.64
accessible    -0.63
senal         -0.63
effective     -0.63
NCT           -0.62
Positive Logits
oooo          1.42
aaaa          1.29
mmmm          1.28
oooooooo      1.21
eeee          1.19
OOOO          1.18
EEEE          1.15
OOOOOOOO      1.15
mmm           1.14
ooo           1.12
Activations Density: 0.346%