INDEX
Explanations
references to the concept of knowledge or recognition
Negative Logits
rana    -0.17
fts     -0.16
ares    -0.15
uros    -0.14
Stub    -0.14
nova    -0.13
ви      -0.13
itler   -0.13
位      -0.13
orgh    -0.13
Positive Logits
simply       0.29
popular      0.24
Simply       0.23
Simply       0.23
popular      0.20
s            0.20
simplement   0.20
familiar     0.19
col          0.19
inform       0.19
Activations Density: 0.024%