INDEX
Explanations
No Explanations Found
Head Attr Weights
0:0.08
1:0.09
2:0.08
3:0.09
4:0.08
5:0.07
6:0.09
7:0.09
8:0.07
9:0.07
10:0.06
11:0.08
Negative Logits
ulence:-1.96
icka:-1.74
film:-1.73
leck:-1.70
ovie:-1.68
orne:-1.67
raint:-1.64
audio:-1.64
WAR:-1.62
ulent:-1.61
Positive Logits
���:2.33
��:1.76
xit:1.61
Drivers:1.58
��:1.55
Guilty:1.54
Discrimination:1.51
Latvia:1.49
=~:1.48
ュ:1.46
Activations Density 0.000%
No Known Activations
This feature has no known activations.