INDEX
Explanations
activation phrases related to requesting a user's attention
references to personal or collective perspective, particularly the use of "me" and "us"
Negative Logits
iaries          -0.61
suicides        -0.60
bidden          -0.59
_.              -0.56
ibrary          -0.55
aks             -0.54
ranch           -0.54
icion           -0.53
fragmentation   -0.52
hedon           -0.52
Positive Logits
rina            0.65
adow            0.64
Widget          0.63
Thumbnails      0.60
img             0.59
azz             0.56
ocol            0.55
tle             0.55
borough         0.55
dam             0.54
Activations Density 0.019%
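
A density figure like the one above can be read as the share of sampled token positions on which the feature fires. The sketch below illustrates that reading only; the page does not state how its number was computed, so the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means active positions / total positions.
    The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 19 active positions out of 100,000 tokens.
sample = [1.5] * 19 + [0.0] * 99_981
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.019% means the feature is active on roughly 19 of every 100,000 tokens, which marks it as a sparse feature.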