Explanations
verbs or phrases expressing discovery, realization, or opinion
expressions of personal opinion or assessment
Negative Logits
rolet      -0.75
isoft      -0.72
idium      -0.67
apeake     -0.66
chief      -0.63
PHOTOS     -0.63
IPS        -0.62
opausal    -0.61
vous       -0.60
icro       -0.60
Positive Logits
myself       0.98
oneself      0.95
fault        0.94
yourself     0.89
ourselves    0.88
themselves   0.87
inspiration  0.86
himself      0.79
it           0.79
herself      0.79
Activations Density: 0.054%
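
The page does not define how the density figure is computed. A common reading is the fraction of token positions on which the feature's activation is nonzero, and the sketch below assumes that reading. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the tool that produced this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 5 of which activate the feature.
acts = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.7, 2.2]
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.054% would correspond to the feature firing on roughly 54 of every 100,000 token positions.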