INDEX
Explanations
the word "Deb" or variations of it
instances of the term "debate."
Negative Logits
Token        Logit
meal         -0.90
faces        -0.72
face         -0.70
hetical      -0.69
istics       -0.68
Inquisitor   -0.65
hens         -0.64
AMI          -0.60
Cth          -0.59
hetically    -0.59
Positive Logits
Token        Logit
ilitation    1.07
rief         1.04
utable       1.03
orah         1.03
unker        1.00
ilit         0.99
ounced       0.98
ouncing      0.96
ilitating    0.92
uting        0.91
Activations Density: 0.027%