Explanations
detailed statements outlining multiple reasons
content related to presenting reasons or justifications for a claim
Negative Logits
Token         Logit
heid          -0.65
film          -0.56
Jagu          -0.55
entin         -0.54
Falk          -0.53
seiz          -0.53
Presence      -0.53
istine        -0.52
Resistance    -0.52
settlers      -0.52
Positive Logits
Token         Logit
myriad        1.17
reasons       1.13
several       0.94
different     0.89
fundamental   0.88
things        0.87
ways          0.85
two           0.84
numerous      0.84
various       0.83
Activations Density: 0.354%