INDEX
Explanations
instances where someone is being criticized or blamed for something
occurrences of the word "for" in various contexts
Negative Logits
edin      -0.87
LAB       -0.81
fleet     -0.77
OTAL      -0.77
along     -0.76
atl       -0.74
awaits    -0.74
nin       -0.74
oct       -0.72
hess      -0.72
Positive Logits
centuries  0.94
geries     0.94
gotten     0.93
gery       0.92
example    0.90
daring     0.89
inaction   0.88
decades    0.85
bidden     0.84
awhile     0.83
Activations Density 0.142%