INDEX
Explanations
expressions related to perception and understanding, particularly in social contexts
Negative Logits
ery        -0.17
×</        -0.15
plate      -0.15
spiel      -0.15
illet      -0.14
üç         -0.14
ookie      -0.14
행         -0.14
atatype    -0.14
STR        -0.14
Positive Logits
689                  0.16
ンツ                 0.15
OMEM                 0.14
.githubusercontent   0.14
-threat              0.13
592                  0.13
istické              0.13
mol                  0.13
Thur                 0.13
Crus                 0.13
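The two logit lists appear to show the vocabulary tokens whose logits this feature most decreases and most increases. The page does not say how they were computed. A common approach, assumed here, projects the feature's decoder direction through the model's unembedding matrix W_U and keeps the k smallest and k largest entries. Below is a minimal pure-Python sketch of that assumed procedure. The function name `logit_effects`, the d_model x vocab_size layout of `unembed`, and the toy vocabulary are all hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str],
                  k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return (most negative, most positive) (token, logit) pairs.

    Assumed setup (hypothetical, not taken from the dashboard):
    decoder_dir has length d_model; unembed is a d_model x vocab_size
    matrix W_U stored as a list of rows.
    """
    d_model = len(decoder_dir)
    # Direct logit effect of the feature direction on each token j:
    # logit[j] = sum_i decoder_dir[i] * W_U[i][j]
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort tokens by logit, ascending.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # k most negative logits
    positive = ranked[::-1][:k]    # k most positive logits, largest first
    return negative, positive


if __name__ == "__main__":
    # Toy example: 2-dimensional residual stream, 4-token vocabulary.
    vocab = ["a", "b", "c", "d"]
    w_u = [[0.1, 0.2, -0.3, 0.0],
           [0.4, -0.2, 0.1, 0.0]]
    neg, pos = logit_effects([1.0, -0.5], w_u, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

In the toy example the logits are a = -0.1, b = 0.3, c = -0.35, and d = 0.0. The sketch therefore reports c and a as the most negative tokens and b and d as the most positive.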
Activation Density: 0.038%
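Activation density is usually read as the fraction of sampled tokens on which the feature fires. That definition, and the threshold of 0, are assumptions; the page does not state them. Under that reading, 0.038% means about 38 active tokens in every 100,000. The sketch below uses a hypothetical helper, `activation_density`, to compute the statistic from a sequence of per-token activations.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.

    The threshold of 0.0 is an assumed default, not a value from the dashboard.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 38 of 100,000 tokens.
    acts = [1.0] * 38 + [0.0] * (100_000 - 38)
    density = activation_density(acts)
    print(f"Activation Density: {density:.3%}")
```

With the hypothetical sample above, the script prints the same 0.038% figure shown on the dashboard.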