INDEX
Explanations
repetitions of the word "this."
Negative Logits
agi       -0.77
ignt      -0.77
anamo     -0.77
hess      -0.76
RD        -0.74
aneers    -0.73
ARS       -0.71
ickets    -0.70
unk       -0.68
oller     -0.68
Positive Logits
week      1.16
weekend   1.03
year      1.03
latest    0.99
month     0.97
newest    0.94
century   0.89
morning   0.86
guy       0.85
decade    0.85
Activations Density: 0.158%