Explanations
instances of the word "this" used multiple times
Negative Logits
Token        Logit
hess         -0.74
rics         -0.72
rikes        -0.66
rior         -0.64
gans         -0.63
eer          -0.63
amate        -0.63
ilitarian    -0.62
adle         -0.61
oller        -0.60
Positive Logits
Token        Logit
week         1.44
year         1.31
morning      1.25
month        1.23
weekend      1.19
afternoon    1.17
semester     1.16
evening      1.03
season       1.01
summer       0.99
Activations Density: 0.070%