INDEX
Explanations
words related to criticism or judgement
the word "der" in various contexts, suggesting a focus on the presence or repetition of this specific term
Negative Logits
ODUCT          -0.69
hibit          -0.67
Dragonbound    -0.67
hetti          -0.67
Reviewer       -0.66
YA             -0.64
cellence       -0.63
yright         -0.63
Hawaiian       -0.63
Crash          -0.62
Positive Logits
iving           1.09
isively         1.06
isive           0.91
ider            0.90
mal             0.85
ision           0.84
oder            0.80
icht            0.80
ivers           0.77
ftime           0.75
Activation Density: 0.005%