Explanations
prepositions used to introduce a reason or explanation
the phrase "For" followed by a number or statement of information
Negative Logits
ady          -0.65
undet        -0.61
SPONSORED    -0.59
unexplained  -0.57
credibility  -0.57
alive        -0.56
thought      -0.55
ogene        -0.54
azes         -0.54
kered        -0.54
Positive Logits
For          2.68
For          1.89
To           1.69
Of           1.50
By           1.46
With         1.45
In           1.40
Until        1.40
From         1.40
Because      1.39
Activations Density: 0.024%