INDEX
Explanations
phrases related to actions or behaviors of other people
references to actions and permissions involving others
Negative Logits
Playoff         -0.76
enegger         -0.69
Unle            -0.67
irst            -0.67
finale          -0.65
Tycoon          -0.64
Notting         -0.64
Highlights      -0.62
Appropri        -0.61
サーティワン    -0.61
Positive Logits
worldly         0.81
ioch            0.80
iris            0.78
inois           0.75
than            0.75
faiths          0.75
expend          0.72
aband           0.70
perspectives    0.70
besides         0.70
Activations Density 0.450%