Explanations
negations and references to the absence of something
Negative Logits
fast    -0.39
most    -0.38
Tu      -0.36
ss      -0.35
fast    -0.35
so      -0.34
damit   -0.34
Dec     -0.33
are     -0.33
Ai      -0.33
Positive Logits
0.72
IsContent     0.64
OGND          0.63
舺            0.63
تضيفلها       0.62
GEBURTS       0.60
transfieras   0.59
новниш        0.58
featureID     0.57
мәкал         0.57
Activations Density 0.138%