Explanations
the word "Interestingly" and similar expressions that introduce noteworthy information
Negative Logits
voeten       -0.57
us           -0.55
tranquillo   -0.54
u            -0.53
ș            -0.50
sogget       -0.50
înc          -0.50
nomme        -0.50
table        -0.49
<bos>        -0.49
Positive Logits
Efq           0.68
])+           0.64
Sigism        0.64
]));          0.63
quartered     0.62
zzarella      0.62
)");          0.62
])]           0.61
rootReducer   0.61
########.     0.61
Activations Density 0.007%