INDEX
Explanations
phrases related to providing information or instructions to the user
references to the reader or listener
Negative Logits
ftime          -0.79
Sabha          -0.75
Fried          -0.66
Tang           -0.66
ĸļ             -0.64
Ange           -0.64
Advent         -0.62
Course         -0.62
Agriculture    -0.62
adelphia       -0.61
Positive Logits
guys           1.03
're            1.02
tub            0.96
know           0.89
RS             0.87
naughty        0.78
've            0.76
glimpse        0.74
decide         0.72
understand     0.71
Activations Density 0.079%