INDEX
Explanations
occurrences of the word "the."
Negative Logits
obligations    -0.70
Lies           -0.65
Doctrine       -0.59
76561          -0.59
ierre          -0.59
cyclopedia     -0.58
Achievements   -0.57
existence      -0.56
Directive      -0.56
anni           -0.56
Positive Logits
verge          0.74
fence          0.68
brink          0.67
helm           0.67
mend           0.64
ularity        0.64
chopping       0.63
Verge          0.61
sidelines      0.60
saddened       0.59
Activations Density 0.041%