INDEX
Explanations
occurrences of the word "der" in various contexts
Negative Logits
antis        -0.71
liberties    -0.70
anwhile      -0.69
justice      -0.67
ees          -0.67
alian        -0.65
ioned        -0.63
apprehens    -0.63
ilton        -0.62
iao          -0.61
Positive Logits
��           0.74
pri          0.67
��           0.66
Scor         0.64
pictured     0.62
certs        0.61
massive      0.61
�            0.60
qv           0.60
RH           0.59
Activations Density 0.017%