INDEX
Explanations
instances of the word "half" followed by a number indicating a proportion
Negative Logits
concess     -0.64
ensis       -0.62
hran        -0.59
lict        -0.57
Reviewer    -0.55
berus       -0.55
anwhile     -0.55
andr        -0.55
destro      -0.54
kson        -0.54
Positive Logits
heartedly    0.79
dozen        0.78
imet         0.69
çͰ           0.69
percent      0.67
wheel        0.65
of           0.65
azo          0.64
pipe         0.63
hearted      0.61
Activations Density 0.028%
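The dashboard reports an activation density but does not say how it is computed. Below is a minimal sketch that assumes density is the share of token positions on which the feature's activation is above zero. The zero threshold, the function name `activation_density`, and the toy data are all assumptions for illustration, not taken from the source.

```python
# Sketch: activation density as the share of token positions where a feature fires.
# Assumption: "fires" means activation > threshold (0.0 by default); the
# dashboard's actual threshold and token sample are not stated in the source.

def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 28 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(0, 28 * 1000, 1000):  # 28 evenly spaced positions
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # 0.028%
```

Under this reading, a density of 0.028% means the feature fires on roughly 28 of every 100,000 tokens.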