Explanations
mentions of close friends or companions
references to friends or social connections
Negative Logits
Mandatory           -0.77
MED                 -0.75
Rational            -0.73
externalActionCode  -0.68
Mist                -0.64
Refuge              -0.63
Quran               -0.63
Situation           -0.63
Psychology          -0.62
Bir                 -0.62
Positive Logits
pals                1.35
ies                 0.91
opher               0.78
ervatives           0.78
hip                 0.78
arious              0.76
eties               0.76
ersen               0.75
hips                0.75
ignt                0.75
Activations Density: 0.013%