Explanations
references to fugitives and runaways
fugitive, runaway
Negative Logits
LabelTagHelper     -0.64
awtextra           -0.60
للاسماء            -0.59
houſe              -0.56
willy              -0.54
IntoConstraints    -0.54
Personendaten      -0.53
Ender              -0.52
zdy                -0.51
Grammar            -0.51
Positive Logits
fug         2.16
Fug         2.05
Fug         2.00
fug         1.85
fugitive    1.84
fuga        1.16
runaway     0.76
fuge        0.72
escaped     0.69
escape      0.67
Activations Density: 0.010%