INDEX
Explanations
references to dogs
Negative Logits
Token       Logit
éĹĺ         -0.83
DERR        -0.79
esson       -0.75
oulos       -0.74
Edison      -0.73
artz        -0.73
erences     -0.72
farious     -0.70
ISTER       -0.67
ORN         -0.66
Positive Logits
Token       Logit
patch       1.08
barking     1.03
fighting    1.02
meat        1.00
fight       0.99
gie         0.96
fights      0.94
fighter     0.93
matic       0.93
matically   0.92
Activations Density 0.030%