Explanations
phrases related to personal stories and experiences
Negative Logits
Token          Logit
percentages    -0.72
ILCS           -0.69
concessions    -0.68
IST            -0.65
ancing         -0.64
essen          -0.63
scenarios      -0.62
*/(            -0.61
ance           -0.61
doses          -0.61
Positive Logits
Token          Logit
hers            1.59
ours            1.56
yours           1.46
mine            1.41
theirs          1.32
Mine            1.01
sorts           0.89
mire            0.80
Mine            0.76
irlf            0.70
Activation density: 0.098%