INDEX
Explanations
phrases indicating significance or importance
instances of the verb "be" in various contexts
Negative Logits
odd         -0.77
't          -0.65
riott       -0.65
Heights     -0.63
now         -0.63
NOW         -0.62
currently   -0.59
arta        -0.59
Mans        -0.57
Franch      -0.57
Positive Logits
judged      0.99
eaten       0.95
able        0.95
remembered  0.94
rewarded    0.93
replaced    0.90
punished    0.90
fall        0.88
greeted     0.86
sacrificed  0.86
Activations Density 0.179%