INDEX
Explanations
instances of the word "review" and its variations
Negative Logits
htub      -0.19
arr       -0.17
fol       -0.16
unter     -0.16
gow       -0.16
abyrin    -0.15
cht       -0.15
ook       -0.14
quires    -0.14
geber     -0.14
Positive Logits
able      0.25
ees       0.24
ee        0.22
ers       0.20
ingly     0.19
ables     0.18
nger      0.17
çİĩ       0.17
avar      0.17
ABLE      0.17
Activations Density: 0.029%