INDEX
Explanations
contractions
the phrase "It’s" followed by various contexts or statements
Negative Logits
={         -0.64
scope      -0.63
andise     -0.63
rence      -0.62
INST       -0.62
ESE        -0.60
igraph     -0.59
Deaths     -0.59
umbnail    -0.59
911        -0.59
Positive Logits
gonna       1.10
gotta       1.09
been        0.99
unclear     0.98
got         0.91
impossible  0.88
gotten      0.87
easy        0.86
worth       0.84
easier      0.83
Activations Density: 0.075%