INDEX
Explanations
phrases indicating additional information or details
references to updates and additional information
Negative Logits
isa          -0.65
gall         -0.65
soever       -0.64
Murd         -0.61
pires        -0.61
imaru        -0.60
conn         -0.59
Steal        -0.58
inval        -0.58
testified    -0.58
Positive Logits
sake             0.86
details          0.84
explanation      0.83
specifics        0.77
ummies           0.75
.):              0.73
purposes         0.73
clarification    0.72
inspiration      0.71
reasons          0.70
Activations Density: 0.239%