Explanations
phrases that indicate a lack of experience, or that someone has never done something before
statements expressing a lack of prior experience or knowledge about something
Negative Logits
Token       Logit
States      -0.79
hol         -0.65
Rap         -0.63
heimer      -0.62
MG          -0.61
Pascal      -0.61
states      -0.60
mart        -0.60
Heads       -0.59
Pe          -0.59
Positive Logits
Token       Logit
been         1.07
theless      0.94
been         0.91
undergone    0.83
ĸļ           0.79
Been         0.78
EVER         0.77
tasted       0.77
izens        0.75
seen         0.74
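The logit lists rank output tokens by how strongly this feature pushes their logits up or down. A common way to compute such a ranking (an assumption here, not taken from this dashboard) is to take the dot product of the feature's decoder direction with each token's unembedding vector and sort the results. The sketch below uses plain Python lists; the names `logit_effects` and `top_logits` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot product of the feature's decoder direction
    with each token's unembedding vector (assumed logit-effect definition)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}

def top_logits(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Toy example with a 2-dimensional residual stream (made-up numbers).
decoder_dir = [1.0, 0.5]
unembed = {"been": [1.0, 0.2], "States": [-0.7, -0.2], "the": [0.1, 0.0]}
positive, negative = top_logits(logit_effects(decoder_dir, unembed), k=2)
print(positive)
print(negative)
```

With a real model the same projection is usually done as one matrix-vector product over the full vocabulary rather than a Python loop.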
Activation Density: 0.051%
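Activation density is commonly reported as the percentage of token positions on which the feature fires; at 0.051%, that is roughly 51 tokens in every 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the threshold are hypothetical, not this dashboard's implementation.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose activation
    exceeds `threshold` (assumed definition of 'firing')."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# 51 active positions out of 100,000 gives a density of about 0.051%.
acts = [0.0] * 99_949 + [1.0] * 51
density = activation_density(acts)
print(f"{density:.3f}%")
```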