INDEX
Explanations
repeated instances of the word "that."
Negative Logits
loop               -0.58
Pitts              -0.52
otro               -0.52
ensk               -0.51
BOS                -0.50
tagHelperRunner    -0.50
stype              -0.50
lue                -0.49
volks              -0.48
instant            -0.48
Positive Logits
bahawa             0.56
efectivamente      0.55
ormais             0.55
bahwa              0.54
puissent           0.52
mães               0.52
embarazadas        0.51
avulla             0.50
kwamba             0.48
chociaż            0.48
Activations Density 0.373%