Explanations
occurrences of the word "over"
Negative Logits
ly          -0.29
shan        -0.23
place       -0.22
eous        -0.19
wards       -0.19
whel        -0.19
icularly    -0.19
bben        -0.18
象          -0.18
ships       -0.17
Positive Logits
hang        0.25
ture        0.25
tures       0.20
kill        0.19
age         0.18
heid        0.18
views       0.18
ha          0.18
iew         0.17
ature       0.17
Activations Density: 0.030%