Explanations
the word "Only" followed by a number
the word "Only" indicating a limitation or exclusivity in statements
Negative Logits
charism     -0.80
insula      -0.72
multipl     -0.64
senal       -0.63
shenan      -0.63
arted       -0.62
dict        -0.61
sem         -0.61
raught      -0.60
ung         -0.59
Positive Logits
ices            0.79
marginally      0.77
onso            0.76
ICES            0.72
incidentally    0.72
kidding         0.71
oor             0.70
accepts         0.70
thia            0.68
phies           0.67
Activations Density 0.055%