INDEX
Explanations
mentions of specific names or proper nouns
Negative Logits
air       -0.19
usi       -0.16
reta      -0.16
DECLARE   -0.15
ammer     -0.14
خرید      -0.14
ρι        -0.14
/popper   -0.14
lernen    -0.14
@update   -0.14
Positive Logits
inder     0.17
ohl       0.16
導        0.15
idian     0.15
quel      0.15
丁        0.15
pitch     0.15
ENTE      0.15
finder    0.14
olis      0.14
Activation Density: 0.065%
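As a rough illustration of the density figure, here is a minimal sketch. It assumes that activation density means the fraction of token positions on which the feature's activation is above zero, so 0.065% corresponds to about 65 firing positions per 100,000 tokens. The function name `activation_density` and the sample data are hypothetical, not taken from the dashboard's own code.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "firing" means activation > threshold (default 0.0).
    """
    values = list(activations)
    if not values:
        raise ValueError("activation_density needs at least one token position")
    firing = sum(1 for a in values if a > threshold)
    return firing / len(values)


# Hypothetical data: the feature fires on 13 of 20,000 token positions.
acts = [1.5] * 13 + [0.0] * (20_000 - 13)
print(f"{activation_density(acts):.3%}")  # 13 / 20,000 = 0.00065, printed as 0.065%
```

The threshold is a parameter because some tools count only activations above a small positive cutoff as firing. Raising it can only lower the reported density.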