Explanations
instances where someone is pointing to something
phrases that include the word "to"
Negative Logits
soever       -0.93
mare         -0.81
lance        -0.78
bit          -0.74
worth        -0.74
stress       -0.74
tails        -0.71
itialized    -0.71
MAS          -0.71
mask         -0.70
Positive Logits
similarities       0.76
inconsistencies    0.69
anecdotal          0.69
finger             0.65
ample              0.63
clues              0.63
preced             0.62
warnings           0.62
successes          0.62
Genocide           0.62
Activations Density 0.063%