INDEX
Explanations
phrases related to expressing opinions or giving speeches
Negative Logits
Written     -0.71
WATCHED     -0.69
accessed    -0.65
>>>>>>>>    -0.64
Modified    -0.63
idav        -0.60
Rollins     -0.59
nikov       -0.59
Edited      -0.59
cream       -0.58
Positive Logits
irlf            0.68
ilk             0.63
order           0.61
utical          0.61
predecessors    0.61
sum             0.61
romy            0.60
govern          0.60
stride          0.59
ngth            0.59
Activations Density: 0.272%